Granite4 Quantization Bug Fix #28398
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, a small and essential subset of CI tests that quickly catches errors. You can ask your reviewers to trigger select CI tests on top of fastcheck CI. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request correctly addresses a quantization bug for the GraniteMoeHybrid model by remapping module names. The approach of using regex to update the quantization configuration is sound.
I've identified a critical bug in the implementation related to incorrect indentation, which could lead to crashes or silent errors in quantization. I've also suggested a refactoring that fixes this bug and improves the code's maintainability and efficiency by removing duplication and pre-compiling regex patterns. Please see the detailed comment.
remapping_rules = [
    # Granite model: mamba -> mixer remapping
    (
        r"model\.layers\.(\d+)\.mamba\.in_proj",
        r"model.layers.\1.mixer.in_proj",
    ),
    (
        r"model\.layers\.(\d+)\.mamba\.out_proj",
        r"model.layers.\1.mixer.out_proj",
    ),
]
# Update ignore list
if hasattr(quant_config, "ignore"):
    updated_ignore = []
    for name in quant_config.ignore:
        updated_name = name
        for pattern, repl in remapping_rules:
            if re.fullmatch(pattern, name):
                updated_name = re.sub(pattern, repl, name)
        updated_ignore.append(updated_name)
    quant_config.ignore = updated_ignore
# Update target list
if hasattr(quant_config, "config_groups"):
    config_groups = quant_config.config_groups
    for group_name in config_groups:
        if "targets" in config_groups[group_name]:
            targets = []
            for name in config_groups[group_name]["targets"]:
                updated_name = name
                for pattern, repl in remapping_rules:
                    if re.fullmatch(pattern, name):
                        updated_name = re.sub(pattern, repl, name)
                targets.append(updated_name)
        config_groups[group_name]["targets"] = targets
    quant_config.config_groups = config_groups
return quant_config
This section has a critical bug and can be refactored for better maintainability and efficiency.
Bug:
The line config_groups[group_name]["targets"] = targets is incorrectly indented. If a config group lacks a "targets" key, it will either be assigned the targets from a previous group or raise an UnboundLocalError. This can cause incorrect quantization or crashes.
Refactoring:
The logic for remapping module names is duplicated for the ignore list and config_groups targets. This can be extracted into a nested helper function. Additionally, pre-compiling the regex patterns improves performance.
The suggested change below fixes the bug and refactors the code for clarity and efficiency.
remapping_rules = [
    # Granite model: mamba -> mixer remapping
    (
        re.compile(r"model\.layers\.(\d+)\.mamba\.in_proj"),
        r"model.layers.\1.mixer.in_proj",
    ),
    (
        re.compile(r"model\.layers\.(\d+)\.mamba\.out_proj"),
        r"model.layers.\1.mixer.out_proj",
    ),
]

def _remap_names(names: list[str]) -> list[str]:
    """Helper to remap a list of names using regex rules."""
    remapped_names = []
    for name in names:
        updated_name = name
        for pattern, repl in remapping_rules:
            if pattern.fullmatch(name):
                updated_name = pattern.sub(repl, name)
                # Assuming a name can only match one rule.
                break
        remapped_names.append(updated_name)
    return remapped_names

# Update ignore list
if hasattr(quant_config, "ignore"):
    quant_config.ignore = _remap_names(quant_config.ignore)

# Update target list
if hasattr(quant_config, "config_groups"):
    for group_config in quant_config.config_groups.values():
        if "targets" in group_config:
            group_config["targets"] = _remap_names(group_config["targets"])

return quant_config
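As a quick sanity check, the refactored helper behaves like this on hypothetical names:

print(_remap_names(["model.layers.3.mamba.in_proj", "lm_head"]))
# -> ['model.layers.3.mixer.in_proj', 'lm_head']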
💡 Codex Review
Here are some automated review suggestions for this pull request.
                    if re.fullmatch(pattern, name):
                        updated_name = re.sub(pattern, repl, name)
                targets.append(updated_name)
        config_groups[group_name]["targets"] = targets
Guard config_groups assignment behind target check
The config_groups[group_name]["targets"] = targets assignment is aligned with the for group_name loop rather than the preceding if "targets" in config_groups[group_name] block. When a quantization config contains a group that lacks a "targets" key (which is common for groups that only tweak global parameters), the code still executes this assignment and references the local variable targets that was never initialised, raising an UnboundLocalError during model construction. The assignment should live inside the if block so groups without targets are skipped.
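To see the failure mode concretely, here is a minimal standalone repro of the mis-indented assignment (the group names and contents are hypothetical):

def remap_targets(config_groups):
    for group_name in config_groups:
        if "targets" in config_groups[group_name]:
            targets = [
                n.replace(".mamba.", ".mixer.")
                for n in config_groups[group_name]["targets"]
            ]
        # Mis-indented: this runs even when the `if` above was skipped, so a
        # group without "targets" inherits the previous group's list, or hits
        # UnboundLocalError when no earlier group defined `targets`.
        config_groups[group_name]["targets"] = targets

remap_targets({"group_0": {"weights": {"num_bits": 8}}})  # UnboundLocalError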
CC @tdoublep
Seems the root issue is just the module naming in vllm/vllm/model_executor/models/granitemoehybrid.py (lines 68 to 84 at 91864b7). I think we just need to correct it instead of adding a hacky patch. 😅
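A minimal sketch of the naming-alignment idea this comment points at, using a hypothetical module rather than vLLM's real classes: naming the submodule the way the checkpoint does makes state-dict keys and quant-config targets line up without any remapping.

import torch.nn as nn

class HypotheticalDecoderLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # If the HF checkpoint calls this block "mamba", giving the vLLM
        # module the same attribute name avoids the mamba/mixer mismatch
        # that this PR otherwise has to patch around.
        self.mamba = nn.Linear(16, 16)

print(list(HypotheticalDecoderLayer().state_dict()))
# -> ['mamba.weight', 'mamba.bias']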
Hi @mgoin @Isotr0py @heheda12345, I updated the PR. I removed the may_be_quant() function and updated the load_weights function so that both the base model and the quantized model are loaded correctly. The llm-compressor PR (for the quantized granite4 model) is here: vllm-project/llm-compressor#2001
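A rough, self-contained sketch of the kind of checkpoint-name translation a load_weights path can apply, assuming weights arrive as (name, tensor) pairs; the helper and the renaming rule are illustrative, not the PR's exact code:

import torch

def remap_checkpoint_names(weights):
    # HF checkpoints name the block ".mamba."; the vLLM module tree
    # (via MambaMixer2) uses ".mixer.", so translate on the way in.
    for name, tensor in weights:
        yield name.replace(".mamba.", ".mixer."), tensor

ckpt = [("model.layers.0.mamba.in_proj.weight", torch.zeros(4, 4))]
print(next(remap_checkpoint_names(ckpt))[0])
# -> model.layers.0.mixer.in_proj.weight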
Pull Request Description
Purpose
This PR adds quantization configuration remapping support for the GraniteMoeHybrid model to handle the naming difference between Hugging Face model checkpoints and vLLM's internal model structure.
Problem: The GraniteMoeHybrid model uses mamba naming in HF checkpoints but mixer naming in vLLM's implementation (inherited from MambaMixer2). This mismatch causes quantization configurations to fail when targeting specific layers, as the ignore lists and target module names don't match the actual vLLM model structure.

Solution: Implements a maybe_update_quant_config() method that remaps quantization config module names from HF format (mamba) to vLLM format (mixer) using regex patterns. This ensures quantization configurations work correctly with the model's actual layer names.

Changes made:
- Added a maybe_update_quant_config() method to the GraniteMoeHybridForCausalLM class
- Added an import re statement at the top of the file
- Remaps ignore lists and config_groups.targets lists in the quantization config
- Example mapping: model.layers.{N}.mamba.in_proj → model.layers.{N}.mixer.in_proj (see the worked example after this list)
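To make the remapping rule concrete, here is the regex applied to a sample layer name (the layer index is illustrative):

import re

pattern = r"model\.layers\.(\d+)\.mamba\.in_proj"
repl = r"model.layers.\1.mixer.in_proj"

# The captured layer index (\1) is carried into the new name.
print(re.sub(pattern, repl, "model.layers.12.mamba.in_proj"))
# -> model.layers.12.mixer.in_proj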
1. Integration Test with FP8 Quantization
# Test with FP8 quantization targeting specific layers
python -m vllm.entrypoints.openai.api_server \
    --model RedHatAI/granite-4.0-h-small-FP8-block \
    --max-model-len 2048

Test Result
Before the fix: quantization configs targeting model.layers.{N}.mamba.* patterns would not match any layers.

After the fix: module names are remapped from mamba to mixer, and both the ignore lists and config_groups.targets entries are updated correctly.

Additional Notes
Essential Elements of an Effective PR Description Checklist

- (If applicable) Update supported_models.md and examples for a new model.